Visual object tracking performance measures revisited
Visual tracking evaluation features a large variety of performance
measures and largely suffers from a lack of consensus about which
measures should be used in experiments. This makes cross-paper tracker
comparison difficult. Furthermore, as some measures may be less effective than
others, the tracking results may be skewed or biased towards particular
tracking aspects. In this paper we revisit the popular performance measures and
tracker performance visualizations and analyze them theoretically and
experimentally. We show that several measures are equivalent in terms of the
information they provide for tracker comparison and, crucially, that some are
more brittle than others. Based on our analysis we narrow down the set of
potential measures to only two complementary ones, describing accuracy and
robustness, thus pushing towards homogenization of the tracker evaluation
methodology. These two measures can be intuitively interpreted and visualized
and have been employed by the recent Visual Object Tracking (VOT) challenges as
the foundation for the evaluation methodology.
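The abstract above does not define the two measures, so the following is only a minimal sketch under stated assumptions: accuracy is taken as the mean region overlap (intersection over union) on frames where the tracker still overlaps the target, and robustness as the number of failures, where a failure is a drop to zero overlap. The `(x, y, width, height)` box layout and the function names are hypothetical choices made for illustration, and the re-initialization step used in the VOT challenges is omitted.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x, y, width, height) of an axis-aligned rectangle.
Box = Tuple[float, float, float, float]


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def accuracy_robustness(pred: Sequence[Box], truth: Sequence[Box]) -> Tuple[float, int]:
    """Return (mean overlap on tracked frames, number of failures)."""
    overlaps = [overlap(p, t) for p, t in zip(pred, truth)]

    # Accuracy: average overlap over frames with non-zero overlap.
    tracked = [o for o in overlaps if o > 0.0]
    accuracy = sum(tracked) / len(tracked) if tracked else 0.0

    # Robustness: count each transition from tracked to zero overlap.
    failures = 0
    previously_tracked = True
    for o in overlaps:
        if o == 0.0 and previously_tracked:
            failures += 1
        previously_tracked = o > 0.0
    return accuracy, failures


truth = [(0.0, 0.0, 10.0, 10.0)] * 4
pred = [
    (0.0, 0.0, 10.0, 10.0),      # perfect overlap
    (5.0, 0.0, 10.0, 10.0),      # partial overlap
    (100.0, 100.0, 10.0, 10.0),  # target lost
    (0.0, 0.0, 10.0, 10.0),      # target recovered
]
acc, fails = accuracy_robustness(pred, truth)
```

On this toy sequence the tracker loses the target once, so the sketch reports one failure and an accuracy of about 0.78. Keeping the two numbers separate reflects the abstract's point that accuracy and robustness are complementary.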
Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking
Object-to-camera motion produces a variety of apparent motion patterns that
significantly affect performance of short-term visual trackers. Despite being
crucial for designing robust trackers, their influence is poorly explored in
standard benchmarks due to weakly defined, biased and overlapping attribute
annotations. In this paper we propose to go beyond pre-recorded benchmarks with
post-hoc annotations by presenting an approach that utilizes omnidirectional
videos to generate realistic, consistently annotated, short-term tracking
scenarios with exactly parameterized motion patterns. We have created an
evaluation system, constructed a fully annotated dataset of omnidirectional
videos and the generators for typical motion patterns. We provide an in-depth
analysis of major tracking paradigms which is complementary to the standard
benchmarks and confirms the expressiveness of our evaluation approach.
A hierarchical adaptive model for robust short-term visual tracking
Visual tracking is a topic in computer vision with applications in many emerging as well as established technological areas, such as robotics, video surveillance, human-computer interaction, autonomous vehicles, and sport analytics. The main question of visual tracking is how to design an algorithm (visual tracker) that determines the state of one or more objects in a stream of images by accounting for their sequential nature. In this doctoral thesis we address two important topics in single-target short-term visual tracking. The first topic is related to the construction of an object appearance model for visual tracking. The modeling and updating of the appearance model is crucial for successful tracking. We introduce a hierarchical appearance model which structures object appearance in multiple layers. The bottom layer contains the most specific information and each higher layer models the appearance information in a more general way. The hierarchical relations are also reflected in the update process, where the higher layers guide the lower layers in their update while the lower layers provide a source for adaptation to higher layers if their information is reliable. The benefits of hierarchical appearance models are demonstrated with two implementations, primarily designed to tackle tracking of non-rigid and articulated objects that present a challenge for many existing trackers. The first example of an appearance model combines local and global visual information in a coupled-layer appearance model. The bottom layer contains a part-based appearance description that is able to adapt to the geometrical deformations of non-rigid targets and the top layer is a multi-modal global object appearance model that guides the model during object appearance changes. The experimental evaluation shows that the proposed coupled-layer appearance model excels in robustness despite the fact that it uses relatively simple appearance descriptors.
Our evaluation also exposed several weaknesses that were reflected in a decreased accuracy. Our second appearance model extends the hierarchy by introducing a third layer and the concept of template anchors. The first two layers are conceptually similar to the original two-layer appearance model, while the third layer is a memory system composed of static templates that provide a strong spatial cue when one of the templates is matched to the image reliably, thus assisting in quick recovery of the entire appearance model. In the experimental evaluation we show that this addition indeed improves the accuracy, as well as the overall performance of a tracker.
The second topic that we address is the performance evaluation of single-target short-term visual tracking algorithms. In contrast to the dominant trend in the past decades, we claim that visual tracking is a complex process and that the performance of visual trackers cannot be reduced to a single performance measure, nor should it be described by an arbitrary set of measures where the relationship between measures is not well understood. In our research we investigate performance measures that are traditionally used in performance evaluation of single-target short-term visual trackers, through theoretical and empirical analysis, and show that some of them measure the same aspect of tracking performance. Based on our analysis we propose a pair of weakly correlated measures to measure the accuracy and robustness of a tracker, propose a visualization of the results, and analyze the entire methodology using theoretical trackers that exhibit extreme tracking behaviors. This is followed by an extension of the methodology to the ranking of multiple trackers, where we also take into account the potentially stochastic nature of visual trackers and test the statistical significance of performance differences. To support the proposed evaluation methodology we have developed an open-source software tool that implements the methodology and a simple communication protocol that enables a straightforward integration of trackers. The proposed evaluation methodology and the evaluation system have been adopted by several Visual Object Tracking (VOT) challenges.
The Ninth Visual Object Tracking VOT2021 Challenge Results
Visual tracking of non-rigid objects
In this thesis we study the field of visual tracking of non-rigid, articulated objects. For this
task a typical visual model, used mostly for the description of rigid objects, has to be extended
so that it can adapt to the deformations of such objects. In our work we present
an extension that is based on a hierarchical approach towards visual model construction. It
is based on a hierarchical combination of local and global visual information. The resulting
visual model extends the existing visual models that use a set of local features connected
with geometrical constraints. This set represents the bottom layer of the presented visual
model. Using local features the visual model builds a multi-modal representation of the
object’s appearance that represents the top layer of the model. Based on this information an
area of the object in a frame is determined, and based on this area, the local feature set is
updated with new features. In the thesis our work is first placed into a research context by
describing recently published work on visual tracking of non-rigid objects. Next, the proposed
visual model is described in detail together with its integration in a simple tracker. The
performance of the tracker is assessed in various experiments using nine different video
sequences. Advantages and disadvantages of the tracker are shown by comparing it
with three different state-of-the-art visual trackers. The thesis is concluded with a
discussion in which some theoretical and practical limitations of the presented visual model
are laid out, as well as some ideas for further development.